1 Dynamic partitioning in a transputer environment
K. Dussa, B. Carlson, L. Dowdy, K.-H. Park 

April 1990 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the 1990
ACM SIGMETRICS conference on Measurement and modeling of computer systems,
Volume 18 Issue 1

Full text available: pdf(1.37 MB) Additional Information: full citation, abstract, references, citings, index terms



Parallel programs are characterized by their speedup behavior. As more processors are 
allocated to a particular parallel program, the program (potentially) executes faster. 
However, there is often a point of diminishing returns, beyond which extra allocated 
processors cannot be used effectively. Extra processors would be better utilized by 
allocating them to another program. Thus, given a set of processors in a multiprocessor 
system, and a set of parallel programs, a partitioning problem na ... 

2 Manageability, availability, and performance in Porcupine: a highly scalable,
cluster-based mail service

Yasushi Saito, Brian N. Bershad, Henry M. Levy 

August 2000 ACM Transactions on Computer Systems (TOCS), Volume 18 Issue 3

Full text available: pdf(2.52 MB) Additional Information: full citation, abstract, references, index terms

This paper describes the motivation, design and performance of Porcupine, a scalable mail 
server. The goal of Porcupine is to provide a highly available and scalable electronic mail
service using a large cluster of commodity PCs. We designed Porcupine to be easy to 
manage by emphasizing dynamic load balancing, automatic configuration, and graceful 
degradation in the presence of failures. Key to the system's manageability, availability, and 
performance is that sessions, data, and underlying ... 

Keywords: cluster, distributed systems, email, group membership protocol, load 
balancing, replication 
W. Heirman, J. Dambre, C. Debaes, H. Thienpont, D. Stroobandt, J. Van Campenhout
April 2005 Proceedings of the 2005 international workshop on System level
interconnect prediction

Reconfigurable interconnection networks for distributed shared memory machines exploit
properties of the workload dynamics that are not easily captured by statistical traffic 
models. Therefore, when designing such a network, one should make trade-offs based on 
full-system simulation for all viable workloads. It is however very time-consuming to do 
such simulations. In this paper, we present a technique that can predict the performance of 
a machine for different network parameters, based on the r ... 

Keywords: distributed shared-memory, interconnection network, prediction model, 
reconfiguration 



4 Performance evaluation and run time support: Specific scheduling support to minimize
the reconfiguration overhead of dynamically reconfigurable hardware
Javier Resano, Daniel Mozos

June 2004 Proceedings of the 41st annual conference on Design automation - Volume 00

Full text available: pdf(333.78 KB) Additional Information: full citation, abstract, references, index terms

Dynamically Reconfigurable Hardware (DRHW) platforms present both flexibility and high 
performance. Hence, they can tackle the demanding requirements of current dynamic 
multimedia applications, especially for embedded systems where it is not affordable to 
include specific HW support for all the applications. However, DRHW reconfiguration latency 
represents a major drawback that can make the use of DRHW resources inefficient for 
highly dynamic applications. To alleviate this problem, we have deve ... 

Keywords: dynamic reconfigurable hardware, run-time scheduling 



5 Hotspot Prevention Through Runtime Reconfiguration in Network-On-Chip 
G. M. Link, N. Vijaykrishnan 

March 2005 Proceedings of the conference on Design, Automation and Test in Europe - 
Volume 1 

Full text available: pdf(87.46 KB) Additional Information: full citation, abstract

Many existing thermal management techniques focus on reducing the overall power
consumption of the chip, and do not address location-specific temperature problems 
referred to as hotspots. We propose the use of dynamic runtime reconfiguration to shift the 
hotspot-inducing computation periodically and make the thermal profile more uniform. Our 
analysis shows that dynamic reconfiguration is an effective technique in reducing hotspots 
for NoCs. 

6 The impact of I/O on program behavior and parallel scheduling
Emilia Rosti, Giuseppe Serazzi, Evgenia Smirni, Mark S. Squillante 

June 1998 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the
1998 ACM SIGMETRICS joint international conference on Measurement and
modeling of computer systems, Volume 26 Issue 1

Full text available: pdf(1.40 MB) Additional Information: full citation, abstract, references, citings, index terms

In this paper we systematically examine various performance issues involved in the
coordinated allocation of processor and disk resources in large-scale parallel computer 
systems. Models are formulated to investigate the I/O and computation behavior of parallel 
programs and workloads, and to analyze parallel scheduling policies under such workloads. 
These models are parameterized by measurements of parallel programs, and they are 
solved via analytic methods and simulation. Our results provide im ... 

7 Interaction cost and shotgun profiling

Brian A. Fields, Rastislav Bodik, Mark D. Hill, Chris J. Newburn

September 2004 ACM Transactions on Architecture and Code Optimization (TACO),
Volume 1 Issue 3

Full text available: pdf(647.17 KB) Additional Information: full citation, abstract, references, index terms

We observe that the challenges software optimizers and microarchitects face every day boil
down to a single problem: bottleneck analysis. A bottleneck is any event or resource that 
contributes to execution time, such as a critical cache miss or window stall. Tasks such as 
tuning processors for energy efficiency and finding the right loads to prefetch all require 
measuring the performance costs of bottlenecks. In the past, simple event counts were 
enough to find the important bottlenecks. Today, t ... 

Keywords: Performance analysis, critical path, modeling, profiling 



8 Manageability, availability and performance in Porcupine: a highly scalable,
cluster-based mail service

Yasushi Saito, Brian N. Bershad, Henry M. Levy

December 1999 ACM SIGOPS Operating Systems Review, Proceedings of the
seventeenth ACM symposium on Operating systems principles, Volume 33 Issue 5

Full text available: pdf(1.62 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper describes the motivation, design, and performance of Porcupine, a scalable mail 
server. The goal of Porcupine is to provide a highly available and scalable electronic mail 
service using a large cluster of commodity PCs. We designed Porcupine to be easy to 
manage by emphasizing dynamic load balancing, automatic configuration, and graceful 
degradation in the presence of failures. Key to the system's manageability, availability, and 
performance is that sessions, data, and underlying serv ... 

9 System-level power optimization: techniques and tools 
Luca Benini, Giovanni de Micheli

April 2000 ACM Transactions on Design Automation of Electronic Systems (TODAES),
Volume 5 Issue 2

Full text available: pdf(385.22 KB) Additional Information: full citation, abstract, references, citings, index terms

This tutorial surveys design methods for energy-efficient system-level design. We consider
electronic systems consisting of a hardware platform and software layers. We consider the
three major constituents of hardware that consume energy, namely computation, 
communication, and storage units, and we review methods of reducing their energy 
consumption. We also study models for analyzing the energy cost of software, and methods 
for energy-efficient software design and compilation. This survey ...

10 Disco: running commodity operating systems on scalable multiprocessors

Edouard Bugnion, Scott Devine, Kinshuk Govil, Mendel Rosenblum

November 1997 ACM Transactions on Computer Systems (TOCS), Volume 15 Issue 4

Full text available: pdf(400.76 KB) Additional Information: full citation, abstract, references, citings, index terms, review

In this article we examine the problem of extending modern operating systems to run 
efficiently on large-scale shared-memory multiprocessors without a large implementation 
effort. Our approach brings back an idea popular in the 1970s: virtual machine monitors. 
We use virtual machines to run multiple commodity operating systems on a scalable 
multiprocessor. This solution addresses many of the challenges facing the system software 
for these machines. We demonstrate our approach with a prototy ... 

11 Advances in software and hardware synthesis techniques for DSP applications:
Efficient mapping of hierarchical trees on coarse-grain reconfigurable architectures
F. Rivera, M. Sanchez-Elez, M. Fernandez, R. Hermida, N. Bagherzadeh
September 2004 Proceedings of the 2nd IEEE/ACM/IFIP international conference on
Hardware/software codesign and system synthesis

Full text available: pdf(316.12 KB) Additional Information: full citation, abstract, references, index terms

Reconfigurable architectures have become increasingly important in recent years. In this 
paper we present an approach to the problem of executing 3D graphics interactive
applications onto these architectures. The hierarchical trees are usually implemented to
reduce the data processed, thereby diminishing the execution time. We have developed a 
mapping scheme that parallelizes the tree execution onto a SIMD reconfigurable 
architecture. This mapping scheme considerably reduces the time penalty cau ... 

Keywords: SIMD, computer graphics, hierarchical trees, multimedia, reconfigurable 
architectures 



12 Dynamic Voltage and Cache Reconfiguration for Low Power 
Andre C. Nacul, Tony Givargis 

February 2004 Proceedings of the conference on Design, automation and test in Europe 
- Volume 2 

Additional Information: full citation, abstract, index terms

In this work, we propose a combined Dynamic Voltage Scaling (DVS) and Dynamic Cache
Reconfiguration (DCR) online algorithm that dynamically adapts the processor speed (i.e.,
voltage) and the cache subsystem to the workload requirements for the purposes of saving 
energy. The workload is considered to be a set of tasks with real-time deadlines. Our online 
algorithm is invoked as part of the OS scheduler, which performs standard earliest deadline 
first (EDF) task scheduling first. Then, our online ... 

13 A Hybrid Prefetch Scheduling Heuristic to Minimize at Run-Time the Reconfiguration
Overhead of Dynamically Reconfigurable Hardware
Javier Resano, Daniel Mozos, Francky Catthoor

March 2005 Proceedings of the conference on Design, Automation and Test in Europe -
Volume 1

Full text available: pdf(173.45 KB) Additional Information: full citation, abstract

Due to the emergence of highly dynamic multimedia applications there is a need for flexible 
platforms and run-time scheduling support for embedded systems. Dynamic Reconfigurable 
Hardware (DRHW) is a promising candidate to provide this flexibility but, currently, not 
sufficient run-time scheduling support to deal with the run-time reconfigurations exists. 
Moreover, executing at run-time a complex scheduling heuristic to provide this support may 
generate an excessive run-time penalty. Hence, we h ... 

14 An analytical model for buffer hit rate prediction
Yongli Xi, Patrick Martin, Wendy Powley 

November 2001 Proceedings of the 2001 conference of the Centre for Advanced Studies 
on Collaborative research 

Full text available: pdf(100.79 KB) Additional Information: full citation, abstract, references, index terms

Of the many tuning parameters available in a database management system (DBMS), one
of the most crucial to performance is the buffer pool size. Choosing an appropriate size,
however, can be a difficult task. In this paper we present an analytical modeling approach
to predicting the buffer pool hit rate that can be used to simplify the process of buffer pool 
sizing. A Markov Chain model is used to estimate the hit rate for buffer pools in IBM's DB2 
Universal Database. We present and experimental ... 

15 An adaptive algorithm for low-power streaming multimedia processing
A. Acquaviva, L. Benini, B. Ricco

March 2001 Proceedings of the conference on Design, automation and test in Europe

Full text available: pdf(135.54 KB) Additional Information: full citation, references, citings, index terms




16 Routing and MAC: Versatile low power media access for wireless sensor networks
Joseph Polastre, Jason Hill, David Culler 

November 2004 Proceedings of the 2nd international conference on Embedded 
networked sensor systems 

Full text available: pdf(529.51 KB) Additional Information: full citation, abstract, references, citings, index terms

We propose B-MAC, a carrier sense media access protocol for wireless sensor
networks that provides a flexible interface to obtain ultra low power operation, effective
collision avoidance, and high channel utilization. To achieve low power operation, B-MAC
employs an adaptive preamble sampling scheme to reduce duty cycle and
minimize idle listening. B-MAC supports on-the-fly reconfiguration and provides
bidirectional interfaces for system services t ... 

Keywords: communication interfaces, energy efficient operation, media access protocols, 
networking, reconfigurable protocols, wireless sensor networks 



17 Managing multi-configuration hardware via dynamic working set analysis
Ashutosh S. Dhodapkar, James E. Smith 

May 2002 ACM SIGARCH Computer Architecture News, volume 30 issue 2 
Full text available: pdf(1.16 MB), Publisher Site Additional Information: full citation, abstract, references, citings, index terms

Microprocessors are designed to provide good average performance over a variety of 
workloads. This can lead to inefficiencies both in power and performance for individual 
programs and during individual phases within the same program. Microarchitectures with 
multi-configuration units (e.g. caches, predictors, instruction windows) are able to adapt 
dynamically to program behavior and enable/disable resources as needed. A key element of 
existing configuration algorithms is adjusting to program phas ... 

18 The impact of job arrival patterns on parallel scheduling
Mark S. Squillante, David D. Yao, Li Zhang 

March 1999 ACM SIGMETRICS Performance Evaluation Review, volume 26 issue 4 
Full text available: pdf(794.58 KB) Additional Information: full citation, abstract, citings, index terms

In this paper we present an initial analysis of the job arrival patterns from a real parallel
computing system and we develop a class of traffic models to characterize these arrival 
patterns. Our analysis of the job arrival data illustrates traffic patterns that exhibit heavy- 
tail behavior and other characteristics which are quite different from the arrival processes 
used in previous studies of parallel scheduling. We then investigate the impact of these 
arrival traffic patterns on the performan ... 

19 Configuring buffer pools in DB2 UDB
Xiaoyi Xu, Patrick Martin, Wendy Powley 

September 2002 Proceedings of the 2002 conference of the Centre for Advanced 
Studies on Collaborative research 

Full text available: pdf(96.74 KB) Additional Information: full citation, abstract, references, citings, index terms

Database Management Systems (DBMSs) use a main memory area as a buffer to reduce
the number of disk accesses performed by a transaction. DB2 Universal Database divides
the buffer area into a number of independent buffer pools and each database object (table
or index) is assigned to a specific buffer pool. The task of configuring the buffer pools,
which defines the mapping of database objects to buffer pools and setting a size for each of 
the buffer pools, is crucial for achieving optimal perfor ... 

20 FAB: building distributed enterprise disk arrays from commodity components
Yasushi Saito, Svend Frølund, Alistair Veitch, Arif Merchant, Susan Spence
October 2004 Proceedings of the 11th international conference on Architectural
support for programming languages and operating systems, Volume 32, 38, 39
Issue 5, 5, 11

Full text available: pdf(671.67 KB) Additional Information: full citation, abstract, references, index terms

This paper describes the design, implementation, and evaluation of a Federated Array of 
Bricks (FAB), a distributed disk array that provides the reliability of traditional enterprise 
arrays with lower cost and better scalability. FAB is built from a collection of bricks, small 
storage appliances containing commodity disks, CPU, NVRAM, and network interface cards. 
FAB deploys a new majority-voting-based algorithm to replicate or erasure-code logical 
blocks across bricks and a reconfigurati ... 

Keywords: consensus, disk array, erasure coding, replication, storage, voting 